Subword Units for a Mandarin Keyword Spotting System

Author

  • Chi-Yan CHOY
Abstract

This paper addresses phonetic modeling in a Mandarin keyword spotting system. The task is to detect 20 keywords in continuous speech from the Call Home corpus of the Linguistic Data Consortium (LDC). Several speech units are explored: whole word, syllable, and demi-syllable (INITIAL and FINAL). In speaker-independent, HMM-based Mandarin keyword spotting experiments, the spotter built on base-syllable keyword models performed best, reaching a spotting accuracy of 83.8% at 9.8 false alarms per keyword per hour (FA/KW/H). In the second part of the study, keyword spotting was performed with different numbers of general filler models (389, 182, 37, and 1) in an effort to reduce computation time and increase flexibility.
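To make the two reported figures concrete, the sketch below shows one common way such metrics are computed. It assumes that spotting accuracy means the percentage of true keyword occurrences that were detected, and that FA/KW/H is the number of false alarms divided by the number of keywords and by the hours of test speech; the abstract does not spell out either definition. The function names and the counts are hypothetical and are not taken from the paper.

```python
def detection_rate(hits: int, occurrences: int) -> float:
    """Percentage of true keyword occurrences that the spotter detected."""
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    return 100.0 * hits / occurrences


def fa_per_kw_per_hour(false_alarms: int, num_keywords: int, hours: float) -> float:
    """False alarms normalized by vocabulary size and test duration."""
    if num_keywords <= 0 or hours <= 0:
        raise ValueError("num_keywords and hours must be positive")
    return false_alarms / (num_keywords * hours)


# Hypothetical counts for illustration only (not results from the paper).
rate = detection_rate(hits=80, occurrences=100)
fa = fa_per_kw_per_hour(false_alarms=50, num_keywords=20, hours=0.5)
print(f"detection rate: {rate:.1f}%  false alarms: {fa:.1f} FA/KW/H")
```

Normalizing by both keyword count and duration lets systems with different vocabularies and test sets be compared on the same scale.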


Similar resources

Comparing decoding strategies for subword-based keyword spotting in low-resourced languages

For languages with limited training resources, out-of-vocabulary (OOV) words are a significant problem, both for transcription and keyword spotting. This paper investigates the use of subword lexical units for keyword spotting. Three strategies for using the subword units are explored: 1) converting word-based lattices to subword lattices after decoding, 2) performing a separate decoding for ea...


Morphological Segmentation for Keyword Spotting

• We explore the impact of morphological segmentation on Keyword Spotting (KWS).
• Handling out-of-vocabulary (OOV) words is a major challenge in KWS; we aim to alleviate this problem by utilizing sub-word units.
• We augment a state-of-the-art KWS system with subword units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentatio...


Cross-word sub-word units for low-resource keyword spotting

We investigate the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task. Sub-word units based on morphological decomposition and character n-grams are compared. In particular, we examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results demonstrate that cr...


An Investigation of Subword Unit Representations for Spoken Document Retrieval

This study investigates the feasibility of using subword unit representations for spoken document retrieval as an alternative to using words generated by either keyword spotting or word recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recogn...


Coalescence Type based Confidence Warping for Agglutinative Language Keyword Spotting

In agglutinative languages like Korean, words are formed by joining affix morphemes to the stem, which leads to a high OOV rate in dictionary building. Hence, subword units are usually used as the basic language modeling units in Large-Vocabulary Continuous Speech Recognition (LVCSR) or LVCSR-based applications such as keyword spotting. In this work, firstly a new word property called coalescence t...




Publication date: 1998